Approximate K Nearest Neighbors in High Dimensions
Authors
Abstract
Given a set P of N points in a d-dimensional space, along with a query point q, it is often desirable to find k points of P that are, with high probability, close to q. This is the Approximate k-Nearest Neighbors (AkNN) problem. We present two algorithms for AkNN. Both require O(N^2 d) preprocessing time. The first algorithm has a query time cost of O(d + log N), while the second has a query time cost of O(d). Both algorithms create an undirected graph on the points of P by adding edges to a linked list storing P in Hilbert order. To find approximate nearest neighbors of a query point, both algorithms perform best-first search on this graph. The first algorithm uses standard one-dimensional indexing structures to find starting points on the graph for this search, whereas the second algorithm uses random starting points. Despite the quadratic preprocessing time, our algorithms have the potential to be useful in machine learning applications where the number of query points that need to be processed is large compared to the number of points in P. The linear dependence on d of the preprocessing and query time costs of our algorithms allows them to remain effective even when dealing with high-
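The best-first search with random starting points described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Two parts are assumptions made only for illustration: the ordering is a plain lexicographic sort standing in for Hilbert order, and the extra edges link each point to its few nearest points by brute force, since the abstract does not state the rule used to add edges. The names `build_graph`, `best_first_knn`, `extra_edges`, and `budget` are hypothetical.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_graph(points, extra_edges=2):
    """Chain the points in a 1-D order, then add a few extra edges per point."""
    n = len(points)
    # ASSUMPTION: lexicographic sort is a stand-in for Hilbert order.
    order = sorted(range(n), key=lambda i: points[i])
    adj = {i: set() for i in range(n)}
    for a, b in zip(order, order[1:]):
        adj[a].add(b)
        adj[b].add(a)
    # ASSUMPTION: hypothetical edge rule, brute force in O(N^2 d) time.
    for i in range(n):
        others = (j for j in range(n) if j != i)
        nearest = heapq.nsmallest(
            extra_edges, others, key=lambda j: sq_dist(points[i], points[j])
        )
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def best_first_knn(points, adj, q, k, starts, budget=50):
    """Expand the closest unexpanded node first, for at most `budget` nodes."""
    visited = set(starts)
    frontier = [(sq_dist(points[s], q), s) for s in visited]
    heapq.heapify(frontier)
    evaluated = []
    while frontier and len(evaluated) < budget:
        d, i = heapq.heappop(frontier)
        evaluated.append((d, i))
        for j in adj[i]:
            if j not in visited:
                visited.add(j)
                heapq.heappush(frontier, (sq_dist(points[j], q), j))
    # Return the indices of the k closest points among those expanded.
    return [i for _, i in heapq.nsmallest(k, evaluated)]
```

Choosing `starts` with `random.sample(range(len(points)), 3)` mimics the random-start variant. Because the chain keeps the graph connected, setting `budget` to the number of points makes the search expand every node and return the exact nearest neighbors, which is a convenient sanity check for the sketch.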
Similar Resources
An Efficient Searching Algorithm for Approximate Nearest Neighbor Queries in High Dimensions
In this paper, we present an approximate nearest neighbor search algorithm that uses heuristics to decide whether or not to access a node in the index tree based on three interesting data distribution properties. We demonstrate that the proposed algorithm significantly reduces the number of nodes accessed compared with the algorithms proposed in earlier works. Also, it will be demonstrate...
Quantitative Analysis of Nearest-Neighbors Search in High-Dimensional Sampling-Based Motion Planning
We quantitatively analyze the performance of exact and approximate nearest-neighbors algorithms on increasingly high-dimensional problems in the context of sampling-based motion planning. We study the impact of the dimension, number of samples, distance metrics, and sampling schemes on the efficiency and accuracy of nearest-neighbors algorithms. Efficiency measures computation time and accuracy...
Implementing a Parallel Dynamic Approximate Nearest Neighbor Search Algorithm
We describe the implementation of a fast, dynamic, approximate nearest-neighbor search algorithm that works well in fixed dimensions (d ≤ 5), based on sorting point coordinates in Morton (or z-) ordering. Our code scales well on multi-core/CPU shared-memory systems. Our implementation is competitive with the best approximate nearest neighbor searching codes available on the web, especially fo...
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and the Internet predominate in users' lives, they bring, alongside business opportunities and time savings, threats such as information theft and system penetration in both hardware and software. Security is the top priority in preventing a cyber-attack, and users should first detect the type of attack because virtual environments are not moni...
High-Dimensional Similarity Search Using Data-Sensitive Space Partitioning
Nearest neighbor search has a wide variety of applications. Unfortunately, the majority of search methods do not scale well with dimensionality. Recent efforts have been focused on finding better approximate solutions that improve the locality of data using dimensionality reduction. However, it is possible to preserve the locality of data and find exact nearest neighbors in high dimensions with...